IgorIgnatevBolt
Contributor

What this PR does / why we need it:

Deletion of some resources can be blocked by finalizers. To detect this and expose it as a metric, we can use the deletion timestamp metadata field.
Introduce a deletion_timestamp metric for the following resources:

  • deployment kube_deployment_deletion_timestamp
  • statefulset kube_statefulset_deletion_timestamp
  • daemonset kube_daemonset_deletion_timestamp
  • service kube_service_deletion_timestamp
  • poddisruptionbudget kube_poddisruptionbudget_deletion_timestamp

Also reformats the tables in the docs.

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Jun 2, 2025
@k8s-ci-robot k8s-ci-robot requested review from logicalhan and mrueg June 2, 2025 11:42
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.), needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.), and size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) labels Jun 2, 2025
@IgorIgnatevBolt IgorIgnatevBolt marked this pull request as ready for review June 2, 2025 11:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Jun 2, 2025
@IgorIgnatevBolt IgorIgnatevBolt force-pushed the feat-deletion-timestamp-resources branch from 63191f9 to d5bb362 Compare June 3, 2025 16:10
@IgorIgnatevBolt
Contributor Author

All commits were squashed into one.

@CatherineF-dev
Contributor

CatherineF-dev commented Jun 20, 2025

Hi, could you share more insights on use cases after these metrics are added?

Is it used for monitoring Kubernetes resources that are stuck in a terminating state?

@IgorIgnatevBolt
Contributor Author

@CatherineF-dev Hi, yes. If the deletion of a resource is stuck for some reason or blocked by a finalizer, the deletion timestamp metric can help detect the case and raise an alert for investigation.

@richabanker
Contributor

/assign

@CatherineF-dev
Contributor

@IgorIgnatevBolt How will we know which resource should be deleted?

@IgorIgnatevBolt
Contributor Author

> @IgorIgnatevBolt How will we know which resource should be deleted?

Maybe I misunderstood the question, but this PR is precisely about detecting resources that were marked for deletion but have not actually been deleted for some reason, e.g. because they are blocked by finalizers.

The controller managing that finalizer notices the update to the object setting the metadata.deletionTimestamp, indicating deletion of the object has been requested.

@IgorIgnatevBolt
Contributor Author

Hi @CatherineF-dev, do you need any more information about the PR, or anything else that would help you move forward?

@dgrisonnet
Member

/assign @CatherineF-dev
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted label (Indicates an issue or PR is ready to be actively worked on.) and removed the needs-triage label (Indicates an issue or PR lacks a `triage/foo` label and requires one.) Aug 7, 2025
| kube_deployment_labels | Gauge | Kubernetes labels converted to Prometheus labels controlled via [--metric-labels-allowlist](../../developer/cli-arguments.md) | `deployment`=&lt;deployment-name&gt; <br> `namespace`=&lt;deployment-namespace&gt; <br> `label_DEPLOYMENT_LABEL`=&lt;DEPLOYMENT_LABEL&gt; | STABLE |
| kube_deployment_created | Gauge | | `deployment`=&lt;deployment-name&gt; <br> `namespace`=&lt;deployment-namespace&gt; | STABLE |
| kube_deployment_deletion_timestamp | Gauge | Unix deletion timestamp | `deployment`=&lt;deployment-name&gt; <br> `namespace`=&lt;deployment-namespace&gt; | EXPERIMENTAL |
Contributor

Should we use kube_deployment_deleted to align with kube_deployment_created?

Contributor Author

I'd like to keep the pattern the same as for other resources like kube_node_deletion_timestamp or kube_pod_deletion_timestamp

Contributor

@richabanker what do you think?

Contributor

Ah, so it's kube_deployment_created that's not following the _timestamp pattern :/

Looks like we have consciously made the switch to use _timestamp in the past so maybe kube_deployment_deletion_timestamp is the way to go.

Additionally, does it make sense to rename kube_deployment_created to follow the same pattern? How disruptive would that change be?

Contributor Author

I guess renaming any existing metric would be a breaking change and would require a longer release process. I'd like to keep the scope of this PR as it is, limited to new metrics.

@k8s-ci-robot k8s-ci-robot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Aug 14, 2025
@IgorIgnatevBolt IgorIgnatevBolt force-pushed the feat-deletion-timestamp-resources branch from d5bb362 to b188b21 Compare August 15, 2025 06:37
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Aug 15, 2025
@IgorIgnatevBolt
Contributor Author

Rebased, conflicts resolved.

@mrueg mrueg added this to the v2.17.0 milestone Aug 15, 2025
This commit adds the kube_*_deletion_timestamp metric for several Kubernetes resources:
- Deployments
- DaemonSets
- StatefulSets
- Services
- PodDisruptionBudgets

The deletion timestamp metric reports the Unix timestamp when a resource
was marked for deletion. This helps with monitoring resource lifecycle
and cleanup processes.

All metrics follow the same pattern:
- Help text: 'Unix deletion timestamp'
- Type: gauge
- Value: Unix timestamp in seconds when DeletionTimestamp is set,
  otherwise the metric is not emitted

Updated documentation and tests are included for all affected resources.
@IgorIgnatevBolt IgorIgnatevBolt force-pushed the feat-deletion-timestamp-resources branch from b188b21 to 53ec1de Compare August 18, 2025 05:57
@IgorIgnatevBolt
Contributor Author

Fixed a typo in the deployment test.

@mrueg mrueg changed the title from "feat: introduce deletion timestamp metric for multiple resources" to "feat: introduce deletion timestamp metric for daemonset, statefulset, deployment, service and pdb" Aug 20, 2025
@mrueg
Member

mrueg commented Aug 20, 2025

/hold

for @CatherineF-dev to further comment.
Renamed the title to include all additional resources in there.

/lgtm

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 20, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged.) Aug 20, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IgorIgnatevBolt, mrueg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Aug 20, 2025
@mrueg
Member

mrueg commented Aug 25, 2025

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Aug 25, 2025
@k8s-ci-robot k8s-ci-robot merged commit 0641627 into kubernetes:main Aug 25, 2025
13 checks passed